[6.1810][code] Processes in xv6 (Part 2)

xv6-riscv

wtommy_fdgkhdkgh 2026-06-01 21:17:16


Series: [6.1810] Learning basic operating-system concepts with MIT 6.1810

Outline

kernel/proc.c/sched
kernel/proc.c/yield
kernel/proc.c/scheduler
kernel/proc.c/cpuid
kernel/proc.c/mycpu
kernel/proc.c/myproc
kernel/proc.c/kexit
kernel/proc.c/kkill
kernel/proc.c/kwait

kernel/proc.c/sched

// Switch to scheduler.  Must hold only p->lock
// and have changed proc->state. Saves and restores
// intena because intena is a property of this
// kernel thread, not this CPU. It should
// be proc->intena and proc->noff, but that would
// break in the few places where a lock is held but
// there's no process.
void
sched(void)
{
  int intena;
  struct proc *p = myproc();

  if(!holding(&p->lock))
    panic("sched p->lock");
  if(mycpu()->noff != 1)
    panic("sched locks");
  if(p->state == RUNNING)
    panic("sched RUNNING");
  if(intr_get())
    panic("sched interruptible");

  intena = mycpu()->intena;
  swtch(&p->context, &mycpu()->context);
  mycpu()->intena = intena;
}

This function switches from the current process's context (p->context) to the context of this CPU's scheduler thread.
if(!holding(&p->lock))
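- The caller must already hold p->lock before calling this function.
- What could go wrong without the lock?
  - Suppose two CPUs are both running their scheduler threads.
  - Both scheduler threads could pick the same process at the same time.
  - The process would then run on two CPUs at once and end up in an inconsistent state.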
if(mycpu()->noff != 1)
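- Q: Why check that noff == 1?
  - A: To make sure p->lock is the only lock this CPU holds.
- What happens if noff > 1? It means that at the moment of the context switch, process-A (the one being switched out) still holds some other spinlock. If process-B (the one being switched in) then tries to acquire that spinlock, it can deadlock: acquire() spins with interrupts disabled, so the timer cannot preempt process-B, and process-B may spin forever waiting for process-A, which is no longer running and cannot release the lock.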
if(p->state == RUNNING)
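- By the time this function is called, the caller must already have changed the process's state away from RUNNING (e.g. to RUNNABLE, SLEEPING or ZOMBIE), because the process is about to give up the CPU.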
if(intr_get())
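- Interrupts must be disabled during the context switch; otherwise an interrupt could arrive in the middle of it. (Holding p->lock already guarantees this, because acquire() calls push_off().)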
intena = mycpu()->intena;
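- intena (interrupt enable) is a property of this kernel thread (the process), not of the CPU, so sched() saves it in a local variable, i.e. on this process's kernel stack.
- intena records whether interrupts were enabled when the outermost push_off was called, so that the final matching pop_off can restore the correct state.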
swtch(&p->context, &mycpu()->context);
- Switch from the current process's context to this CPU's scheduler thread.
mycpu()->intena = intena;
- When this process is scheduled again, copy intena from its stack back into the per-CPU struct.
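To make the intena bookkeeping concrete, here is a minimal user-space model of push_off/pop_off and of the save/restore that sched() performs. It is a sketch, not xv6 code: `sie` stands in for the SSTATUS_SIE bit, and `struct cpu_model`, `cpu0`, `push_off_model`, `pop_off_model` and `sched_model` are hypothetical names. The callback passed to `sched_model` plays the role of the other kernel threads that run on this CPU while this thread is switched out.

```c
#include <stdbool.h>
#include <stdlib.h>

// Hypothetical model: `sie` stands in for the SSTATUS_SIE bit.
static bool sie = true;

struct cpu_model {
  int noff;     // depth of push_off() nesting
  bool intena;  // were interrupts enabled before the outermost push_off()?
};

static struct cpu_model cpu0;

static void push_off_model(void) {
  bool old = sie;
  sie = false;                 // intr_off()
  if (cpu0.noff == 0)
    cpu0.intena = old;         // remember the state only at the outermost level
  cpu0.noff += 1;
}

static void pop_off_model(void) {
  if (sie || cpu0.noff < 1)
    abort();                   // xv6 panics in these cases
  cpu0.noff -= 1;
  if (cpu0.noff == 0 && cpu0.intena)
    sie = true;                // intr_on() only at the outermost pop_off()
}

// Models sched(): keep intena in a local (this thread's stack) while
// other threads run on the same CPU, then put it back.
static void sched_model(void (*other_threads)(void)) {
  bool intena = cpu0.intena;
  other_threads();             // stands in for swtch() away and back
  cpu0.intena = intena;
}
```

Without the `cpu0.intena = intena;` restore, the final pop_off would act on whatever value another thread left behind, so interrupts could end up in the wrong state.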

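The use of p->lock here is worth a closer look, because it is one of the rare places where a lock is acquired by one thread and released by another (process-A and the CPU's scheduler thread).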

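Before picking process-A, the CPU-scheduler-thread acquires process-A->lock, so that no other CPU can pick the same process-A. (link)
After the switch to process-A, process-A itself releases process-A->lock (for example in yield, right after sched returns). (link)
process-A runs its own code until it is about to be switched out.
Before calling sched, process-A must release every other lock and hold only process-A->lock. (link)
process-A calls sched and switches to the CPU-scheduler-thread. (link)
The CPU-scheduler-thread releases process-A->lock. (link)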
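This hand-off is legal in xv6 because a spinlock remembers which CPU holds it (holding() compares lk->cpu with mycpu()), not which thread. The scheduler thread and process-A run on the same CPU, so either of them may release a lock the other acquired. The sketch below models only that ownership rule; `struct spinlock_model`, `current_cpu` and the `*_model` functions are hypothetical, and panics become `false` return values.

```c
#include <stdbool.h>

// Hypothetical model of an xv6-style spinlock: ownership is recorded per CPU.
struct spinlock_model {
  bool locked;
  int cpu;                   // id of the CPU holding the lock, or -1
};

static int current_cpu = 0;  // stands in for cpuid()

static bool holding_model(const struct spinlock_model *lk) {
  return lk->locked && lk->cpu == current_cpu;
}

// Returns false where xv6 would panic or spin (no real concurrency here).
static bool acquire_model(struct spinlock_model *lk) {
  if (holding_model(lk))
    return false;            // xv6: panic("acquire")
  if (lk->locked)
    return false;            // a real spinlock would spin here instead
  lk->locked = true;
  lk->cpu = current_cpu;
  return true;
}

static bool release_model(struct spinlock_model *lk) {
  if (!holding_model(lk))
    return false;            // xv6: panic("release")
  lk->cpu = -1;
  lk->locked = false;
  return true;
}
```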

kernel/proc.c/yield

// Give up the CPU for one scheduling round.
void
yield(void)
{
  struct proc *p = myproc();
  acquire(&p->lock);
  p->state = RUNNABLE;
  sched();
  release(&p->lock);
}

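This function makes a process give up the CPU so that other processes get a chance to run.
p->state = RUNNABLE;
- Before giving up the CPU, the process marks itself RUNNABLE so that the scheduler can pick it again later.
sched()
- Actually hands the CPU over.
release(&p->lock);
- The next time the CPU-scheduler-thread picks this process, execution resumes here, and the process releases the p->lock that the scheduler acquired.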
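Putting yield() and the checks in sched() together, a minimal model of one call looks like this. It assumes hypothetical types (`struct proc_model`, `struct cpu_state`), replaces each panic in sched() with a nonzero return code, and skips the actual swtch(); the state names follow xv6's enum procstate.

```c
#include <stdbool.h>

enum procstate { UNUSED, USED, SLEEPING, RUNNABLE, RUNNING, ZOMBIE };

struct proc_model { bool lock_held; enum procstate state; };
struct cpu_state  { int noff; bool intr_enabled; };

// Returns 0 if sched() would switch away, or the panic it would hit.
static int sched_checks(const struct proc_model *p, const struct cpu_state *c) {
  if (!p->lock_held)        return 1;  // "sched p->lock"
  if (c->noff != 1)         return 2;  // "sched locks"
  if (p->state == RUNNING)  return 3;  // "sched RUNNING"
  if (c->intr_enabled)      return 4;  // "sched interruptible"
  return 0;
}

static int yield_model(struct proc_model *p, struct cpu_state *c) {
  p->lock_held = true;       // acquire(&p->lock), which calls push_off():
  c->noff += 1;              //   one more level of nesting ...
  c->intr_enabled = false;   //   ... and interrupts off
  p->state = RUNNABLE;
  int err = sched_checks(p, c);
  if (err != 0)
    return err;              // xv6 would panic here
  // Real sched() calls swtch() here. Later the scheduler picks p again,
  // sets p->state = RUNNING and switches back, still holding p->lock.
  p->state = RUNNING;
  p->lock_held = false;      // release(&p->lock); interrupt restore omitted
  c->noff -= 1;
  return 0;
}
```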

kernel/proc.c/scheduler

// Per-CPU process scheduler.
// Each CPU calls scheduler() after setting itself up.
// Scheduler never returns.  It loops, doing:
//  - choose a process to run.
//  - swtch to start running that process.
//  - eventually that process transfers control
//    via swtch back to the scheduler.
void
scheduler(void)
{
  struct proc *p;
  struct cpu *c = mycpu();

  c->proc = 0;

c->proc = 0: this CPU is not running any process; it is running the CPU-scheduler-thread.
If an interrupt (e.g. a timer interrupt) arrives while the CPU-scheduler-thread is running, the interrupt handler sees myproc() == 0 and skips preemption (yield).
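The preemption decision described above can be modeled as a single predicate. This is a sketch: `should_yield` is a hypothetical name, and it assumes the convention of xv6's devintr(), which returns 2 for a timer interrupt.

```c
#include <stdbool.h>
#include <stddef.h>

struct proc;  // opaque in this sketch

// Hypothetical model: yield only on a timer interrupt (which_dev == 2)
// and only if a process, not the scheduler thread, is running.
static bool should_yield(int which_dev, const struct proc *current) {
  return which_dev == 2 && current != NULL;
}
```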

  for(;;){
    // The most recent process to run may have had interrupts
    // turned off; enable them to avoid a deadlock if all
    // processes are waiting. Then turn them back off
    // to avoid a possible race between an interrupt
    // and wfi.
    intr_on();
    intr_off();

This looks odd: interrupts are enabled (intr_on) and then immediately disabled again (intr_off). The two calls serve different purposes:
- intr_on: to avoid a deadlock
- intr_off: to avoid a lost-wakeup race condition
intr_on (to avoid a deadlock)
- If the CPU has just woken from wfi (wait for interrupt) because of an interrupt, interrupts must be enabled so that the CPU can handle the pending interrupt.
- What would happen without this step?
  - We know that a process calling sched must first acquire its own struct-proc->lock (via acquire), which disables interrupts.
  - So when control returns here, interrupts may still be off; as the code comment says, the most recent process to run may have had interrupts turned off.
  - If every process is sleeping while waiting for some event (e.g. a disk read) and interrupts stay off, the interrupt that would wake them is never handled. The system never gets a RUNNABLE process again and hangs: a deadlock!
  - To solve this, interrupts have to be enabled here, so that the hardware's requests are handled and the processes waiting on the hardware are woken up.
intr_off (to avoid a lost-wakeup race condition)
- What would happen if we skipped intr_off and left interrupts enabled?
  - The CPU-scheduler-thread scans the whole process table and finds no RUNNABLE process (found == 0).
  - After the scan, it is about to execute wfi and go to sleep. Just before wfi, an interrupt arrives and the CPU services it (e.g. the disk has finished reading), which makes some process RUNNABLE.
  - After the interrupt service routine returns, the CPU goes back to the CPU-scheduler-thread and executes wfi, going to sleep.
  - Now there is a RUNNABLE process, and the scheduler thread should be switching to it, but the wakeup has been missed (a lost wakeup).
  - Once the wakeup is missed, the CPU can only wait for some other, unrelated interrupt to wake it from wfi and bring it back into the scheduler loop.
- How does intr_off solve this?
  - Because intr_off disables interrupts first, the CPU does not service interrupts (e.g. run the interrupt service routine for the disk read) while the for-loop scans all the processes.
  - wfi has the property that even when interrupts are globally disabled (intr_off clears SSTATUS_SIE), the CPU still wakes from wfi as long as an interrupt is pending.
  - In the example above:
    - process-A sleeps waiting for a disk read. The read completes and the disk sends an interrupt to the CPU. Although global interrupts are disabled, the interrupt stays pending on the CPU, which turns wfi into a no-op (it returns immediately).
    - The next iteration of for(;;) reaches intr_on (link).
    - As soon as global interrupts are enabled, the CPU enters the interrupt service routine, handles the disk-read interrupt, and wakes process-A.
    - Result: the interrupt wakes process-A as intended and makes it RUNNABLE, instead of being missed and leaving process-A asleep.
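The whole argument rests on one rule: wfi looks at whether an interrupt is pending, not at SSTATUS_SIE. The sketch below models one scheduler iteration under that rule and replays both scenarios. `struct hart_model` and its helpers are hypothetical, and the hardware is only approximated (a pending interrupt is taken immediately whenever `sie` is true).

```c
#include <stdbool.h>

// Hypothetical model of one hart.
struct hart_model {
  bool sie;       // SSTATUS_SIE: global interrupt enable
  bool pending;   // an interrupt is waiting to be taken
  int runnable;   // number of RUNNABLE processes
};

// With interrupts enabled, a pending interrupt is taken at once;
// its handler wakes one sleeping process (e.g. the disk handler's wakeup()).
static void take_interrupts_if_enabled(struct hart_model *h) {
  if (h->sie && h->pending) {
    h->pending = false;
    h->runnable += 1;
  }
}

static void device_raises_irq(struct hart_model *h) {
  h->pending = true;
  take_interrupts_if_enabled(h);
}

// wfi stalls only if nothing is pending; SIE does not matter.
static bool wfi_sleeps(const struct hart_model *h) {
  return !h->pending;
}
```

In the first scenario of the test below the interrupt is consumed before wfi, so wfi sleeps even though a process is RUNNABLE; in the second it stays pending, wfi falls through, and the next intr_on handles it.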

    int found = 0;
    for(p = proc; p < &proc[NPROC]; p++) {
      acquire(&p->lock);

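xv6-riscv supports at most NPROC processes (64 by default). The loop walks over every process and checks its state to find one that can run.
Before examining a process, the scheduler acquires that process's lock, so that multiple CPUs cannot switch to the same process at the same time.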

      if(p->state == RUNNABLE) {
        // Switch to chosen process.  It is the process's job
        // to release its lock and then reacquire it
        // before jumping back to us.
        p->state = RUNNING;
        c->proc = p;
        swtch(&c->context, &p->context);

p->state = RUNNING;
- Change the process's state from RUNNABLE to RUNNING: the process is about to run!
c->proc = p;
- Record p as the process currently running on this CPU.
swtch(&c->context, &p->context);
- Switch from this CPU's scheduler thread to the process's context.

        // Process is done running for now.
        // It should have changed its p->state before coming back.
        c->proc = 0;
        found = 1;
      }
      release(&p->lock);
    }

c->proc = 0;
- The process has given up the CPU, so this CPU's current process is set back to 0.
found = 1;
- Records that this pass found a RUNNABLE process and switched to it.
release(&p->lock);
- Done with this process, so release its p->lock.

    if(found == 0) {
      // nothing to run; stop running on this core until an interrupt.
      asm volatile("wfi");
    }
  }
}

If this pass found no process that can run (no RUNNABLE process), the CPU executes wfi and sleeps until the next interrupt.
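One pass of the loop above can be simulated in user space to see which processes run and when found stays 0. In this sketch `NPROC_MODEL`, `struct proc_model` and `scheduler_pass` are hypothetical; locks appear only as comments, and every process that runs is assumed to yield straight back (so it returns to RUNNABLE).

```c
enum procstate { UNUSED, USED, SLEEPING, RUNNABLE, RUNNING, ZOMBIE };

#define NPROC_MODEL 8   // xv6 uses NPROC (64); kept small here

struct proc_model { int pid; enum procstate state; };

// Writes the pids that ran into `ran` and returns how many ran.
// A return value of 0 corresponds to found == 0, i.e. the real scheduler would wfi.
static int scheduler_pass(struct proc_model table[], int n, int ran[]) {
  int count = 0;
  for (int i = 0; i < n; i++) {
    // acquire(&p->lock)
    if (table[i].state == RUNNABLE) {
      table[i].state = RUNNING;     // p->state = RUNNING; c->proc = p;
      ran[count++] = table[i].pid;  // swtch(&c->context, &p->context) ...
      table[i].state = RUNNABLE;    // ... the process yields and comes back
    }
    // release(&p->lock)
  }
  return count;
}
```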

kernel/proc.c/cpuid

// Must be called with interrupts disabled,
// to prevent race with process being moved
// to a different CPU.
int
cpuid()
{
  int id = r_tp();
  return id;
}

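start() (link) stores the cpu id (hart id) in the tp register.
Interrupts must be disabled when calling this function; otherwise the process could be moved to a different CPU right after reading tp, and the returned id would be stale.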

kernel/proc.c/mycpu

// Return this CPU's cpu struct.
// Interrupts must be disabled.
struct cpu*
mycpu(void)
{
  int id = cpuid();
  struct cpu *c = &cpus[id];
  return c;
}

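This function returns the struct-cpu of the CPU we are currently running on.
It first calls cpuid() to get the current CPU's id,
then uses that id to index the cpus array and returns the matching struct-cpu.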

kernel/proc.c/myproc

// Return the current struct proc *, or zero if none.
struct proc*
myproc(void)
{
  push_off();
  struct cpu *c = mycpu();
  struct proc *p = c->proc;
  pop_off();
  return p;
}

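This function returns the process currently running on this CPU. If the CPU is running the CPU-scheduler-thread rather than a process, it returns 0.
It first calls push_off to disable interrupts, so that the process cannot be moved to a different CPU halfway through the function and end up returning the wrong process.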
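cpuid(), mycpu() and myproc() form a short chain: tp gives the hart id, the id indexes cpus[], and the struct cpu holds the current proc. A minimal model, assuming a global `tp_model` in place of the tp register and hypothetical `*_model` function names:

```c
#include <stddef.h>

#define NCPU_MODEL 4

struct proc { int pid; };
struct cpu  { struct proc *proc; };

static struct cpu cpus[NCPU_MODEL];
static int tp_model;   // stands in for the tp register (hart id)

static int cpuid_model(void) { return tp_model; }

static struct cpu *mycpu_model(void) { return &cpus[cpuid_model()]; }

static struct proc *myproc_model(void) {
  // xv6: push_off() here, so the thread cannot migrate between
  // reading tp and reading c->proc.
  struct cpu *c = mycpu_model();
  struct proc *p = c->proc;
  // xv6: pop_off() here.
  return p;
}
```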

kernel/proc.c/kexit

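A process cannot completely clean up after itself when it exits; it relies on its parent process to free its resources. That is why every process must have a parent.
TODO : … there may be a special case here; initproc needs further study.

Why does it need another process to free its resources?
What else could it do? Free its own text section? But the process's own code is still running and still using the text section, so how could it free it?

In the end, the most intuitive approach still seems to be letting the parent process free the resources on its behalf.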

// Exit the current process.  Does not return.
// An exited process remains in the zombie state
// until its parent calls wait().
void
kexit(int status)
{

status: the exit code, which is passed on to the parent process.

  struct proc *p = myproc();

Get the struct-proc of the currently running process.

  if(p == initproc)
    panic("init exiting");

initproc must never exit.
The init process stays resident in the operating system so that it can wait for orphaned processes.

  // Close all open files.
  for(int fd = 0; fd < NOFILE; fd++){
    if(p->ofile[fd]){
      struct file *f = p->ofile[fd];
      fileclose(f);
      p->ofile[fd] = 0;
    }
  }

Walk the process's file-descriptor table: every non-zero struct file * in it is a file the process has open.
Close all of them.

  begin_op();
  iput(p->cwd);
  end_op();
  p->cwd = 0;

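Release the process's cwd (current working directory).
iput
- Decrements the reference count of the corresponding in-memory struct-inode.
- When the last reference is dropped and the inode has no links left (nlink == 0), iput truncates the inode and frees it on disk, which writes to the disk.
Because iput may write to the disk, the call must be wrapped in begin_op and end_op to form a transaction.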
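The condition under which iput touches the disk can be modeled as follows. This is a simplified sketch: `struct inode_model` and `inodes_freed` are hypothetical, locking is omitted, and the on-disk work (itrunc, iupdate) is reduced to a counter.

```c
#include <stdbool.h>

// Hypothetical model of the part of iput() discussed above.
struct inode_model {
  int ref;      // in-memory references (e.g. p->cwd, open files)
  int nlink;    // directory entries pointing at this inode on disk
  bool valid;   // in-memory copy has been read from disk
};

static int inodes_freed;  // counts simulated truncate-and-free operations

static void iput_model(struct inode_model *ip) {
  if (ip->ref == 1 && ip->valid && ip->nlink == 0) {
    // Last reference to an unlinked inode: xv6 truncates it and writes
    // the freed inode back, which must happen inside begin_op()/end_op().
    inodes_freed++;
    ip->valid = false;
  }
  ip->ref--;
}
```

So dropping the cwd reference usually just decrements a counter; only when it was the last reference to an inode that has since been unlinked does the disk get written.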

  acquire(&wait_lock);

  // Give any children to init.
  reparent(p);

  // Parent might be sleeping in wait().
  wakeup(p->parent);

acquire(&wait_lock)
- Acquire the global wait_lock.
- wait_lock protects the parent/child relationships between processes; any code that reads or changes those relationships must hold it.
- It is acquired here because reparent is about to change them.
reparent(p);
- The current process is exiting, so its children are about to lose their parent. Every process needs a parent, because the parent is the one that frees a child's resources (the wait function).
- So p's children need a new parent. Here, all of them are handed to a special process: initproc (PID == 1).
wakeup(p->parent)
- Wakes up the current process's parent: if the parent has called the wait system call, it is sleeping until a child exits.
- Once awake, the parent can free this process's resources.
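reparent() is a simple scan of the process table. The sketch below mirrors its loop shape in user space; `NPROC_MODEL` and `init_wakeups` are hypothetical, locking is omitted, and the wakeup(initproc) that xv6 performs for each re-parented child is reduced to a counter.

```c
#include <stddef.h>

#define NPROC_MODEL 8

struct proc { int pid; struct proc *parent; };

static struct proc proc[NPROC_MODEL];
static struct proc *initproc = &proc[0];
static int init_wakeups;  // counts simulated wakeup(initproc) calls

// In xv6 the caller holds wait_lock.
static void reparent_model(struct proc *p) {
  for (struct proc *pp = proc; pp < &proc[NPROC_MODEL]; pp++) {
    if (pp->parent == p) {
      pp->parent = initproc;
      init_wakeups++;  // xv6: wakeup(initproc), in case the child is already a ZOMBIE
    }
  }
}
```

Only direct children move to init; grandchildren keep their own parents.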

  acquire(&p->lock);

  p->xstate = status;
  p->state = ZOMBIE;

  release(&wait_lock);

  // Jump into the scheduler, never to return.
  sched();
  panic("zombie exit");
}

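acquire(&p->lock)
- The state inside struct-proc is about to change, so the lock must be held.
p->xstate = status
- Save the exit status. When the parent calls wait to reclaim this child's resources, this value is handed to the parent.
- status == 0: usually means success
- status != 0: usually means an error; different values can indicate different causes
p->state = ZOMBIE;
- The process can never become RUNNABLE again, so the CPU-scheduler-thread will never pick it.
- The process is now waiting to be reclaimed by its parent.
release(&wait_lock);
- The parent/child relationships will not change any further, so release the global wait_lock.
sched();
- sched requires p->lock to be held, and kexit has already acquired it.
- This switches to the CPU-scheduler-thread.
panic("zombie exit");
- A process in the ZOMBIE state should never be picked by the CPU-scheduler-thread again, so control should never reach this line; if it does, the kernel panics.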

kernel/proc.c/kkill

// Kill the process with the given pid.
// The victim won't exit until it tries to return
// to user space (see usertrap() in trap.c).
int
kkill(int pid)
{
  struct proc *p;

this function
- Tries to kill the process with the given PID and end its execution.
- kill does not terminate the target right away; it sets the target's struct-proc->killed field to 1 to mark it as killed.
- If the target process is sleeping, kill wakes it by setting its state back to RUNNABLE.
- The target actually exits when it is about to return from kernel space to user space: kernel/trap.c/usertrap checks the killed flag and, when killed == 1, calls kexit to terminate the process.
(arg) pid
- The PID of the process to kill.
return value
- Success: 0
- Failure: -1

  for(p = proc; p < &proc[NPROC]; p++){

A for-loop over every process.

    acquire(&p->lock);
    if(p->pid == pid){
      p->killed = 1;
      if(p->state == SLEEPING){
        // Wake process from sleep().
        p->state = RUNNABLE;
      }
      release(&p->lock);
      return 0;
    }
    release(&p->lock);
  }
  return -1;
}

If a process with a matching pid is found, kkill sets p->killed = 1.
If that process is sleeping, kkill sets p->state back to RUNNABLE directly, so the scheduler can pick it and run it.
If no process matches the pid, kkill fails and returns -1.
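Without the locks, kkill() reduces to the small search below. It is a sketch over a hypothetical `struct proc_model` array, not the kernel's proc[] table.

```c
enum procstate { UNUSED, USED, SLEEPING, RUNNABLE, RUNNING, ZOMBIE };

struct proc_model { int pid; int killed; enum procstate state; };

// Same logic as kkill(), with per-process locking omitted.
static int kkill_model(struct proc_model table[], int n, int pid) {
  for (int i = 0; i < n; i++) {
    if (table[i].pid == pid) {
      table[i].killed = 1;
      if (table[i].state == SLEEPING)
        table[i].state = RUNNABLE;  // let it run so it can notice `killed`
      return 0;
    }
  }
  return -1;
}
```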

kernel/proc.c/kwait

// Wait for a child process to exit and return its pid.
// Return -1 if this process has no children.
int
kwait(uint64 addr)
{

this function
- Resource reclamation: reclaims the resources of a child process that has finished running and entered the ZOMBIE state.
- If the process has children but none of them has exited yet, it sleeps until one of them enters the ZOMBIE state.
- Copies the exit status of the reaped child out to user space.
(arg) addr
- A user-space virtual address where the reaped child's exit status (child-process->xstate) is stored.
- If this value is 0, the exit status is not copied out.
return value
- value == -1: the process has no children, or some other error occurred.
- value != -1: the PID of the child that was waited for and whose resources were freed.

  struct proc *pp;
  int havekids, pid;
  struct proc *p = myproc();

  acquire(&wait_lock);

The code is about to examine parent/child relationships (e.g. walking the process table to find the current process's children), so it first acquires the global wait_lock.

  for(;;){
    // Scan through table looking for exited children.

Enter an infinite loop. It exits when:

a child process has exited and been reaped;
the current process has been killed;
the current process has no children.

    havekids = 0;
    for(pp = proc; pp < &proc[NPROC]; pp++){

Walk every process, looking for all of the current process's children.

      if(pp->parent == p){
        // make sure the child isn't still in exit() or swtch().
        acquire(&pp->lock);

        havekids = 1;
        if(pp->state == ZOMBIE){
          // Found one.
          pid = pp->pid;
          if(addr != 0 && copyout(p->pagetable, addr, (char *)&pp->xstate,
                                  sizeof(pp->xstate)) < 0) {
            release(&pp->lock);
            release(&wait_lock);
            return -1;
          }
          freeproc(pp);
          release(&pp->lock);
          release(&wait_lock);
          return pid;
        }
        release(&pp->lock);
      }
    }

if(pp->parent == p){
- While walking the process table, look for this process's own children.
acquire(&pp->lock);
- Acquire the child's lock: partly because the code is about to read fields of its struct-proc, and partly to make sure the child is not in some intermediate state (e.g. still inside exit() or swtch()).
havekids = 1
- Records that this process has at least one child.
if(pp->state == ZOMBIE){
- The child is in the ZOMBIE state, so it is time to reclaim it!
- note: Why is it called ZOMBIE? The walking dead: the process has ended its life cycle (it called kexit), but its resources (e.g. virtual memory pages, trapframe…) still exist. The body is still there, but the soul is gone...
if(addr != 0 && copyout(p->pagetable, addr, (char *)&pp->xstate, sizeof(pp->xstate)) < 0) {
- addr != 0
  - addr != 0: the child's exit status should be handed to user space, so copyout must be called.
  - addr == 0: the exit status is ignored.
- copyout
  - Copies data to user space. On failure it returns a value < 0, which makes kwait fail as well and return -1.
freeproc(pp);
- Frees the child's resources.
- Sets the child's state to UNUSED so that the slot can be allocated again.
return pid
- If the child's resources were reclaimed successfully, the function returns the child's pid.
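One iteration of kwait()'s outer loop can be modeled without locks or copyout(). In this sketch the `status` pointer stands in for `addr`, `WAIT_WOULD_SLEEP` is a hypothetical return code marking the point where kwait() would call sleep(), freeproc() is reduced to clearing three fields, and the killed(p) check is omitted.

```c
#include <stddef.h>

enum procstate { UNUSED, USED, SLEEPING, RUNNABLE, RUNNING, ZOMBIE };

struct proc { int pid; int xstate; enum procstate state; struct proc *parent; };

#define WAIT_WOULD_SLEEP (-2)  // hypothetical: kwait() would sleep here

static int kwait_scan(struct proc table[], int n, struct proc *p, int *status) {
  int havekids = 0;
  for (struct proc *pp = table; pp < table + n; pp++) {
    if (pp->parent != p)
      continue;
    havekids = 1;
    if (pp->state == ZOMBIE) {
      int pid = pp->pid;
      if (status != NULL)
        *status = pp->xstate;  // copyout() in xv6
      pp->state = UNUSED;      // freeproc() also frees memory, trapframe, ...
      pp->pid = 0;
      pp->parent = NULL;
      return pid;
    }
  }
  return havekids ? WAIT_WOULD_SLEEP : -1;
}
```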

    // No point waiting if we don't have any children.
    if(!havekids || killed(p)){
      release(&wait_lock);
      return -1;
    }

If the process has no children, or has been killed, kwait fails immediately and returns -1.

    // Wait for a child to exit.
    sleep(p, &wait_lock);  //DOC: wait-sleep
  }
}

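If no child has exited yet, the process goes to sleep, using a pointer to its own struct-proc as the sleep channel. sleep releases wait_lock while the process sleeps and re-acquires it before returning, so an exiting child can still take wait_lock in kexit.
When a child calls kexit, it wakes the parent with wakeup(p->parent);.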
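The channel mechanism itself is small: sleep() records a channel and marks the process SLEEPING, and wakeup() makes every other process sleeping on that channel RUNNABLE. A minimal model, with hypothetical `sleep_model`/`wakeup_model` names and without the locking and swtch() that the real functions perform:

```c
#include <stddef.h>

enum procstate { UNUSED, USED, SLEEPING, RUNNABLE, RUNNING, ZOMBIE };

struct proc { int pid; enum procstate state; void *chan; };

static void sleep_model(struct proc *p, void *chan) {
  p->chan = chan;
  p->state = SLEEPING;  // xv6 then calls sched()
}

// Like xv6's wakeup(), skips the calling process itself.
static void wakeup_model(struct proc table[], int n, struct proc *self, void *chan) {
  for (int i = 0; i < n; i++) {
    struct proc *p = &table[i];
    if (p != self && p->state == SLEEPING && p->chan == chan)
      p->state = RUNNABLE;
  }
}
```

Using the parent's own struct proc pointer as the channel means a child's wakeup(p->parent) reaches exactly the parent waiting in kwait(), and not unrelated sleepers.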
